Smart Paper Notes

Activation Learning by Local Competitions

Hongchao Zhou

Categories: Neural and Evolutionary Computing | Artificial Intelligence | Computer Vision | Machine Learning

2022-09-26

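Backpropagation, which drives the success of deep learning, is most likely different from the brain's learning mechanism. In this paper, we formulate a biologically inspired learning rule that, following the idea of Hebb's famous proposal, discovers features through local competitions. We show that the unsupervised features learned by this local learning rule can serve as a pre-training model to improve the performance of some supervised learning tasks. More importantly, this local learning rule enables us to build a new learning paradigm, very different from backpropagation, named activation learning, in which the output activation of the neural network roughly measures how probable the input pattern is. Activation learning can learn rich local features from only a few shots of input patterns, and it performs significantly better than the backpropagation algorithm when the number of training samples is relatively small. This learning paradigm unifies unsupervised learning, supervised learning, and generative models, and it is also more secure against adversarial attacks, opening up some possibilities for building general-task neural networks.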
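
The abstract builds on Hebb's proposal that features emerge through local competition between neurons. For orientation only, the sketch below implements the classic winner-take-all (competitive) Hebbian rule. It is a textbook rule, not the learning rule proposed in the paper, and the function name `competitive_hebbian`, the learning rate, and the weight-normalization step are all assumptions.

```python
import math


def competitive_hebbian(samples, weights, lr=0.1, epochs=20):
    """Winner-take-all Hebbian learning (generic textbook rule, NOT the paper's rule).

    For each input x, every unit's activation is the dot product w_i . x.
    Only the most active unit (the winner of the local competition) is
    updated: its weights grow in proportion to x, then are renormalized to
    unit length so they cannot grow without bound.
    """
    weights = [list(w) for w in weights]  # copy so the caller's data is untouched
    for _ in range(epochs):
        for x in samples:
            activations = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
            k = max(range(len(weights)), key=activations.__getitem__)  # winner
            w = weights[k]
            w[:] = [wi + lr * xi for wi, xi in zip(w, x)]  # Hebbian step
            norm = math.sqrt(sum(wi * wi for wi in w))
            w[:] = [wi / norm for wi in w]  # keep ||w|| = 1
    return weights
```

With inputs clustered near (1, 0) and (0, 1), each unit ends up specializing in one cluster. Renormalizing after every step is one common way to stop pure Hebbian growth; the paper may handle this differently.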

translated by Google Translate

Location reference recognition from texts: A survey and comparison

Xuke Hu , Zhiyong Zhou , Hao Li , Yingjie Hu , Fuqiang Gu , Jens Kersten , Hongchao Fan , Friederike Klan

Category: Natural Language Processing

2022-07-04

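Unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives, contain a vast amount of location information. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. Although geoparsing can benefit many domains, a summary of its specific applications is still missing. Furthermore, there is no comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches to location reference recognition by categorizing them into four groups: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references. The results of this thorough evaluation can inform future methodological developments for location reference recognition and can guide the selection of appropriate approaches according to application needs.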
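
Of the approach families surveyed, dictionary lookup against a gazetteer (a list of known place names) is the simplest to illustrate. The sketch below is a minimal, hypothetical matcher that prefers the longest place name at each position; the function name, the tokenizer regex, and the `max_len` limit are assumptions and do not correspond to any of the 27 evaluated systems.

```python
import re


def gazetteer_match(text, gazetteer, max_len=4):
    """Greedy longest-match lookup of place names in text.

    `gazetteer` is a set of lower-cased place names. Multi-word names are
    matched as whole token sequences, longest first, so "New York City"
    wins over "New York".
    """
    tokens = re.findall(r"[A-Za-z]+", text)  # ASCII-only tokenizer (simplification)
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest first
            if " ".join(lowered[i:i + n]) in gazetteer:
                found.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no place name starts at this token
    return found
```

Pure lookup cannot tell the town of Reading from the verb "reading", and the ASCII-only tokenizer drops accented or hyphenated names; ambiguities like these are part of why statistical-learning and hybrid approaches exist.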

TAToo: Vision-based Joint Tracking of Anatomy and Tool for Skull-base Surgery

Zhaoshuo Li , Hongchao Shu , Ruixing Liang , Anna Goodridge , Manish Sahu , Francis X. Creighton , Russell H. Taylor , Mathias Unberath

Categories: Computer Vision | Artificial Intelligence

2022-12-29

Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of the patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground-truth motion is available, and anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1°. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need for any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning our depth network to reach the 1 mm clinical accuracy goal desired for surgical applications in the skull base.
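
The abstract reports rotation errors below 1° but does not define the metric. A common choice, assumed here, is the geodesic angle between the estimated and reference rotations, theta = arccos((trace(R_est^T R_gt) - 1) / 2). A minimal sketch with plain nested lists follows; the helper `rot_z` is hypothetical and exists only to build example rotations.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix about the z-axis (row-major nested lists)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(r_est, r_gt):
    """Geodesic angle in degrees: arccos((trace(R_est^T R_gt) - 1) / 2)."""
    # trace(A^T B) equals the element-wise sum of A * B
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))
```

For example, `rotation_error_deg(rot_z(0.7), rot_z(0.0))` is about 0.7, which would fall under the 1° bound reported in the abstract. The clamp guards `acos` against values drifting just outside [-1, 1] through floating-point error.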

Attributes Guided Feature Learning for Vehicle Re-identification

Hongchao Li , Xianmin Lin , Aihua Zheng , Chenglong Li , Bin Luo , Ran He , Amir Hussain

Category: Computer Vision

2019-05-22

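Vehicle re-identification (Re-ID) has recently attracted enthusiastic attention because of its potential applications in smart cities and urban surveillance. However, it suffers from large intra-class variation caused by viewpoint and illumination changes, as well as from inter-class similarity, especially between different identities with similar appearances. To address these issues, in this paper we propose a novel deep network architecture for vehicle Re-ID that is guided by meaningful attributes, including camera view, vehicle type, and color. In particular, our network is trained end to end and contains three subnetworks of deep features embedded by the corresponding attributes (i.e., camera view, vehicle type, and vehicle color). Moreover, to overcome the shortage of vehicle images from different views, we design a view-specified generative adversarial network to generate multi-view vehicle images. For network training, we annotate view labels on the VeRi-776 dataset. Note that the pre-trained view (as well as type and color) subnetworks can be adopted directly on other datasets that provide only ID information, which demonstrates the generalization of our model. Extensive experiments on the benchmark datasets VeRi-776 and VehicleID show that the proposed approach achieves promising performance and sets a new state of the art for vehicle Re-ID.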
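
The abstract does not say how the three attribute-guided subnetworks are combined at retrieval time. One plausible, purely hypothetical scheme is to L2-normalize each embedding, concatenate them, and rank gallery images by cosine similarity to the query. All function names below are illustrative, and the toy 2-D features stand in for real subnetwork outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def fuse_attributes(view_feat, type_feat, color_feat):
    """Hypothetical fusion: normalize each attribute-guided embedding so that
    no single subnetwork dominates, concatenate, then normalize the result."""
    parts = [l2_normalize(f) for f in (view_feat, type_feat, color_feat)]
    return l2_normalize(parts[0] + parts[1] + parts[2])


def rank_gallery(query, gallery):
    """Gallery ids sorted by cosine similarity to the query, best first.

    Both query and gallery features are unit vectors, so the dot product
    equals the cosine similarity.
    """
    scores = {gid: sum(q * g for q, g in zip(query, feat)) for gid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Normalizing each part before concatenation is a design choice made here so that an embedding with a large magnitude cannot swamp the other two in the distance.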

Cluster-guided Contrastive Graph Clustering Network

Xihong Yang , Yue Liu , Sihang Zhou , Siwei Wang , Wenxuan Tu , Qun Zheng , Xinwang Liu , Liming Fang , En Zhu

Category: Machine Learning

2023-01-03

Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of performing complex node or edge perturbations, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, which improves the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experiments on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
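
The objective described in this abstract can be illustrated with a simplified loss: positives are the same high-confidence node seen through the two encoders, and negatives are the centers of different high-confidence clusters across views. This is a sketch under stated assumptions (plain cosine terms, equal weighting of the two parts, nonzero embeddings, helper names invented here), not the exact CCGC objective.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_guided_loss(z1, z2, labels, confident):
    """Simplified cluster-guided contrastive loss over two views.

    z1[i], z2[i]: embeddings of node i from the two (non-shared) encoders.
    labels[i]:    cluster assignment of node i.
    confident:    indices of high-confidence nodes; only these are used.
    """
    # Positives: same node across views; minimizing 1 - cos maximizes similarity.
    pos = sum(1.0 - cosine(z1[i], z2[i]) for i in confident) / len(confident)

    # Per-view cluster centers, built from high-confidence nodes only.
    clusters = sorted({labels[i] for i in confident})
    c1 = {k: mean_vector([z1[i] for i in confident if labels[i] == k]) for k in clusters}
    c2 = {k: mean_vector([z2[i] for i in confident if labels[i] == k]) for k in clusters}

    # Negatives: centers of *different* clusters across views; minimize similarity.
    pairs = [(a, b) for a in clusters for b in clusters if a != b]
    neg = sum(cosine(c1[a], c2[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return pos + neg
```

When the two views agree and the clusters are well separated, the positive term vanishes and only the small center-to-center similarity remains; shuffling nodes between views raises both terms.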

Explaining Imitation Learning through Frames

Boyuan Zheng , Jianlong Zhou , Chunjie Liu , Yiqiao Li , Fang Chen

Categories: Machine Learning | Computer Vision

2023-01-03

As one of the prevalent methods for building automated systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how the importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
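
The masking-and-retraining loop can be sketched in the style of RISE: sample a random frame mask, retrain on the masked demonstrations, and credit the resulting environment return to every frame that was kept. `train_and_evaluate` is a hypothetical callback standing in for the black-box retraining, and the keep probability, number of rounds, and normalization are assumptions rather than values from the paper.

```python
import random


def frame_importance(num_frames, train_and_evaluate, rounds=200, keep_prob=0.5, seed=0):
    """RISE-style importance map over demonstration frames (illustrative sketch).

    Each round keeps every frame independently with probability `keep_prob`,
    calls the caller-supplied `train_and_evaluate(mask) -> float` (retrain on
    the masked demonstrations, return the environment return), and adds that
    return to the score of every frame the mask kept.
    """
    rng = random.Random(seed)
    importance = [0.0] * num_frames
    for _ in range(rounds):
        mask = [rng.random() < keep_prob for _ in range(num_frames)]
        ret = train_and_evaluate(mask)
        for f, kept in enumerate(mask):
            if kept:
                importance[f] += ret
    # Normalize by the expected number of times each frame is kept.
    return [s / (rounds * keep_prob) for s in importance]
```

With a toy callback that returns 10 when frame 2 is kept and 1 otherwise, frame 2 receives the largest importance (about 10), while every other frame's score averages out to the overall mean return (about 5.5).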

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

Zhongtao Chen , Chenghu Mi , Siwei Duo , Jingfei He , Yatong Zhou

Category: Natural Language Processing

2023-01-03

Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. To use topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To use clustering to promote topic extraction, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop), which integrates text clustering and topic extraction into a unified framework and can simultaneously produce high-quality clustering results and extract topics from each cluster. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, thanks to its self-attention architecture, it pays close attention to topic-related words for topic extraction. Moreover, the enhanced language model is trained in an unsupervised manner. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within it.
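
ClusTop extracts topics using the self-attention of its enhanced language model, which the abstract does not detail. As a simpler, swapped-in stand-in for the final step (topics per cluster), the sketch below ranks words with a class-based TF-IDF: all texts in a cluster are pooled, and words that are frequent in one cluster but appear in few clusters score highest. The function name and scoring formula are assumptions, not ClusTop's method.

```python
import math
from collections import Counter


def cluster_topics(texts, labels, top_k=3, stopwords=frozenset()):
    """Top words per cluster scored by a class-based TF-IDF (illustrative).

    score(w, c) = (count of w in c / words in c) * log(n_clusters / clusters containing w)
    Words present in every cluster get a score of 0.
    """
    pooled = {}
    for text, label in zip(texts, labels):
        words = [w for w in text.lower().split() if w not in stopwords]
        pooled.setdefault(label, Counter()).update(words)

    n_clusters = len(pooled)
    # Iterating a Counter yields each distinct word once, so this counts clusters per word.
    df = Counter(w for counts in pooled.values() for w in counts)

    topics = {}
    for label, counts in pooled.items():
        total = sum(counts.values())
        scores = {w: (c / total) * math.log(n_clusters / df[w]) for w, c in counts.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        topics[label] = ranked[:top_k]
    return topics
```

In a joint framework like ClusTop, the cluster labels fed to a step like this would come from clustering the language-model embeddings rather than being given in advance.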

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jie Liu , Yixiao Zhang , Jie-Neng Chen , Junfei Xiao , Yongyi Lu , Bennett A. Landman , Yixuan Yuan , Alan Yuille , Yucheng Tang , Zongwei Zhou

Categories: Computer Vision | Machine Learning

2023-01-02

An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of these datasets is small, partially labeled, and rarely includes subjects with severe tumors. Moreover, current models are limited to segmenting specific organs or tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, yielding what we call the CLIP-Driven Universal Model. The Universal Model segments 25 organs and 6 types of tumors more accurately by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.
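
A deliberately simplified picture of why class embeddings derived from text make the model extensible: each class is scored against voxel features through its own embedding and an independent sigmoid, so overlapping labels (an organ and a tumor inside it) can co-occur, and a new class only needs a new embedding. The embeddings, the dot-product head, and the threshold are assumptions; the abstract does not describe the actual segmentation head.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment(voxel_features, class_embeddings, threshold=0.5):
    """Per-class binary masks from (hypothetical) text-derived class embeddings.

    Each class is scored independently with a sigmoid rather than a softmax
    over classes, so one voxel may belong to several classes at once.
    """
    masks = {}
    for name, emb in class_embeddings.items():
        scores = [sigmoid(sum(f * e for f, e in zip(feat, emb))) for feat in voxel_features]
        masks[name] = [s > threshold for s in scores]
    return masks
```

Adding an entry such as `"spleen": [-1.0, -1.0]` to `class_embeddings` produces a new mask without changing the masks of existing classes, a toy version of extending to new classes without forgetting old ones.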

PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis

Hong-Yu Zhou , Chixiang Lu , Chaoqi Chen , Sibei Yang , Yizhou Yu

Categories: Computer Vision | Machine Learning

2023-01-02

Recent advances in self-supervised learning (SSL) for computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate this locality problem of comparative SSL, we propose incorporating a pixel restoration task that explicitly encodes more pixel-level information into high-level semantics. We also address the preservation of scale information, which is a powerful aid to image understanding but has drawn little attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes by large margins with limited annotations.
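
The multi-task objective on the feature pyramid can be sketched as a per-scale sum of a pixel-restoration term and a siamese comparison term. Mean squared error for restoration, a 1 - cosine comparison, equal weighting across scales, and the `weight` parameter are assumptions for illustration; the abstract does not give the actual losses.

```python
import math


def mse(a, b):
    """Mean squared error between two flattened pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine(u, v):
    """Cosine similarity of two nonzero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def pyramid_loss(levels, weight=1.0):
    """Sum restoration and comparison terms over every pyramid level.

    Each level is a dict with:
      "restored" / "original": flattened pixels (restoration target)
      "feat_a" / "feat_b":     features of two siamese views (comparison target)
    """
    total = 0.0
    for lvl in levels:
        restore = mse(lvl["restored"], lvl["original"])
        compare = 1.0 - cosine(lvl["feat_a"], lvl["feat_b"])  # 0 when views agree
        total += restore + weight * compare
    return total
```

Summing over levels means every scale contributes both local (pixel) and global (semantic) supervision, which is the intuition the abstract gives for preserving scale information.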

Credible Remote Sensing Scene Classification Using Evidential Fusion on Aerial-Ground Dual-view Images

Kun Zhao , Qian Gao , Siyuan Hao , Jie Sun , Lijian Zhou

Categories: Computer Vision | Artificial Intelligence

2023-01-02

Because multi-view (multi-source, multi-modal, multi-perspective, etc.) data offer more comprehensive information than data from a single view, they are used increasingly often in remote sensing tasks. However, as the number of views grows, data quality issues become more apparent, limiting the potential benefits of multi-view data. Although recent models based on deep neural networks (DNNs) can learn data weights adaptively, the lack of research on explicitly quantifying the quality of each view during fusion leaves these models unexplainable, unsatisfactory in performance, and inflexible in downstream remote sensing tasks. To fill this gap, this paper introduces evidential deep learning to aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk receives more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at https://github.com/gaopiaoliang/Evidential.
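
The per-view uncertainty can be illustrated with the subjective-logic formulation commonly used in evidential deep learning: with K classes and non-negative evidence e_k, set alpha_k = e_k + 1 and S = sum of alpha_k; then belief b_k = e_k / S and uncertainty u = K / S. The fusion step below, which weights each view by 1 - u, is a hypothetical stand-in for the paper's decision-level strategy, whose exact rule the abstract does not state.

```python
def opinion(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S,
    so the beliefs and the uncertainty together sum to 1.
    """
    k = len(evidence)
    s = sum(evidence) + k
    return [e / s for e in evidence], k / s


def fuse_views(evidence_per_view):
    """Hypothetical decision-level fusion: weight each view's beliefs by (1 - u)."""
    opinions = [opinion(e) for e in evidence_per_view]
    weights = [1.0 - u for _, u in opinions]
    total = sum(weights) or 1.0  # guard: no evidence in any view
    n_classes = len(evidence_per_view[0])
    fused = [
        sum(w * b[c] for w, (b, _) in zip(weights, opinions)) / total
        for c in range(n_classes)
    ]
    return fused, [u for _, u in opinions]
```

For example, an aerial view with evidence [9, 0, 1] has u = 3/13, while a ground view with evidence [0, 1, 0] has u = 0.75, so the fused decision follows the aerial view. A view with little evidence thus contributes little, in line with the abstract's goal of giving lower-risk views more weight.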
